You're out of free questions.

Upgrade now

You created a game that is more popular than Angry Birds.

Each round, players receive a score between 0 and 100, which you use to rank them from highest to lowest. So far you're using an algorithm that sorts in O(nlgn)O(n\lg{n}) time, but players are complaining that their rankings aren't updated fast enough. You need a faster sorting algorithm.

Write a function that takes:

  1. a list of unsorted_scores
  2. the highest_possible_score in the game

and returns a sorted list of scores in less than O(nlgn)O(n\lg{n}) time.

For example:

  unsorted_scores = [37, 89, 41, 65, 91, 53]
HIGHEST_POSSIBLE_SCORE = 100

# Returns [91, 89, 65, 53, 41, 37]
sort_scores(unsorted_scores, HIGHEST_POSSIBLE_SCORE)

We’re defining nn as the number of unsorted_scores because we’re expecting the number of players to keep climbing.

And, we'll treat highest_possible_score as a constant instead of factoring it into our big O time and space costs because the highest possible score isn’t going to change. Even if we do redesign the game a little, the scores will stay around the same order of magnitude.

Gotchas

Multiple players can have the same score! If 10 people got a score of 90, the number 90 should appear 10 times in our output list.

We can do this in O(n)O(n) time and space.

Breakdown

O(nlgn)O(n\lg{n}) is the time to beat. Even if our list of scores were already sorted we'd have to do a full walk through the list to confirm that it was in fact fully sorted. So we have to spend at least O(n)O(n) time on our sorting function. If we're going to do better than O(nlgn)O(n\lg{n}), we're probably going to do exactly O(n)O(n).

What are some common ways to get O(n)O(n) runtime?

One common way to get O(n)O(n) runtime is to use a greedy algorithm.

A greedy algorithm builds up a solution by choosing the option that looks the best at every step.

Say you're a cashier and need to give someone 67 cents (US) using as few coins as possible. How would you do it?

Whenever picking which coin to use, you'd take the highest-value coin you could. A quarter, another quarter, then a dime, a nickel, and finally two pennies. That's a greedy algorithm, because you're always greedily choosing the coin that covers the biggest portion of the remaining amount.

Some other places where a greedy algorithm gets you the best solution:

  • Trying to fit as many overlapping meetings as possible in a conference room? At each step, schedule the meeting that ends earliest.
  • Looking for a minimum spanning tree in a graph? At each step, greedily pick the cheapest edge that reaches a new vertex.

Careful: sometimes a greedy algorithm doesn't give you an optimal solution:

Validating that a greedy strategy always gets the best answer is tricky. Either prove that the answer produced by the greedy algorithm is as good as an optimal answer, or run through a rigorous set of test cases to convince your interviewer (and yourself) that its correct.

But in this case we're not looking to just grab a specific value from our input set (e.g. the "largest" or the "greatest difference")—we're looking to reorder the whole set. That doesn't lend itself as well to a greedy approach.

Another common way to get O(n)O(n) runtime is to use counting.

Counting is a common pattern in time-saving algorithms. It can often get you O(n)O(n) runtime, but at the expense of adding O(n)O(n) space.

The idea is to define a dictionary or list (call it e.g. counts) where the keys/indices represent the items from the input set and the values represent the number of times the item appears. In one pass through the input you can fully populate counts:

  counts = {}
for item in the_list:
    if item in counts:
        counts[item] += 1
    else:
        counts[item] = 1

Once you know how many times each item appears, it's trivial to:

  • generate a sorted list
  • find the item that appears the most times
  • etc.
We can build a list score_counts where the indices represent scores and the values represent how many times the score appears. Once we have that, can we generate a sorted list of scores?

What if we did an in-order walk through score_counts. Each index represents a score and its value represents the count of appearances. So we can simply add the score to a new list sorted_scores as many times as count of appearances.

Solution

We use counting sort.

  def sort_scores(unsorted_scores, highest_possible_score):
    # List of 0s at indices 0..highest_possible_score
    score_counts = [0] * (highest_possible_score+1)

    # Populate score_counts
    for score in unsorted_scores:
        score_counts[score] += 1

    # Populate the final sorted list
    sorted_scores = []

    # For each item in score_counts
    for score in range(len(score_counts) - 1, -1, -1):
        count = score_counts[score]

        # For the number of times the item occurs
        for time in range(count):
            # Add it to the sorted list
            sorted_scores.append(score)

    return sorted_scores

Complexity

O(n)O(n) time and O(n)O(n) space, where nn is the number of scores.

Wait, aren't we nesting two loops towards the bottom? So shouldn't it be O(n2)O(n^2) time? Notice what those loops iterate over. The outer loop runs once for each unique number in the list. The inner loop runs once for each time that number occurred.

So in essence we're just looping through the nn numbers from our input list, except we're splitting it into two steps: (1) each unique number, and (2) each time that number appeared.

Here's another way to think about it: in each iteration of our two nested loops, we append one item to sorted_scores. How many numbers end up in sorted_scores in the end? Exactly how many were in our input list! nn.

If we didn't treat highest_possible_score as a constant, we could call it kk and say we have O(n+k)O(n+k) time and O(n+k)O(n+k) space.

Bonus

Note that by optimizing for time we ended up incurring some space cost! What if we were optimizing for space?

We chose to generate and return a separate, sorted list. Could we instead sort the list in place? Does this change the time complexity? The space complexity?

Do you have an answer?

Wanna review this one again later? Or do you feel like you got it all?

Mark as done Pin for review later
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
import unittest
def sort_scores(unsorted_scores, highest_possible_score):
# Sort the scores in O(n) time
return []
# Tests
class Test(unittest.TestCase):
def test_no_scores(self):
actual = sort_scores([], 100)
expected = []
self.assertEqual(actual, expected)
def test_one_score(self):
actual = sort_scores([55], 100)
expected = [55]
self.assertEqual(actual, expected)
def test_two_scores(self):
actual = sort_scores([30, 60], 100)
expected = [60, 30]
self.assertEqual(actual, expected)
def test_many_scores(self):
actual = sort_scores([37, 89, 41, 65, 91, 53], 100)
expected = [91, 89, 65, 53, 41, 37]
self.assertEqual(actual, expected)
def test_repeated_scores(self):
actual = sort_scores([20, 10, 30, 30, 10, 20], 100)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Reset editor

Powered by qualified.io

. . .